Related Applications

The present application is related to U.S. Patent Application Serial No. ______, filed ______, entitled DATABASE SIZER FOR NT SIZER SYSTEM; U.S. Patent Application Serial No. ______, filed ______, entitled SIZING SERVERS FOR DATABASE MANAGEMENT SYSTEMS VIA USER DEFINED WORKLOADS; U.S. Patent Application Serial No. ______, filed ______, entitled COMBINATION OF MASS STORAGE SIZER, COMPARATOR, OLTP USER DEFINED WORKLOAD SIZER, AND DESIGN TRADEOFF TOOL IN ONE PACKAGE; U.S. Patent Application Serial No. ______, filed ______, entitled ALGORITHMS TO CALCULATE MASS STORAGE REQUIREMENTS FOR NT SIZER; and U.S. Patent Application Serial No. ______, filed ______, entitled METHOD OF COMPARISON FOR COMPUTER SYSTEMS AND APPARATUS THEREFOR, all of which are assigned to the assignee of the present invention and incorporated herein by reference.

Field of the Invention

The present invention relates generally to computers and software. More specifically, the present invention is related to database management system (DBMS) hardware and software. The present invention includes software tools for determining the hardware components required to handle workloads while adjusting the components to remain within hardware utilization limits.

Background of the Invention

Relational databases came into common use in computers over twenty years ago. Despite improvements in database software and new methodologies, relational databases remain the mainstay of database management systems (DBMS). Hardware






,„..o.— — — — :::: 

L .eve,opea — .a.a.e™e„. .s»„s « ».e .o.e ope. .a 
. .... — ..a. ...... 

,.,„.eWo.a„.so«w.e..ao.. «.pa..PPO«a.«a..a.o 

v>ppnme more common. 

.....a„a.™e...e.sa,s„..„e.pa,a.e.«oeUe„.-s.^ 

L — ......*.e.Ma..e. — o...a..e„e 

L....a..:npa...a.,..e.*.«„a.eo«.W.ea™*.^^^^ 

exis«ngda«a,on.wi*upaa.esofe.«nsda.ana — otnewaa.. 

Le»e,ec».c — ..as_o...e..e™e.o...»^^^^ 

:,..a.o™ae...„.p.a..™. „ »e.e. a. a.— 

lesC.....pec...a,so.a.eU........a.eWwa.e.o.e. 

.pen. ..e..e......*o......-.^^ 






based roughly on the capabilities of a known system. It may be the case that a given name DBMS server is believed to satisfy a current or future requirement, and the tpmC capability of that DBMS server is available to the vendor and/or from the TPC organization database. A more accurate performance value could, in theory, be derived from a series of more specific descriptions of the requirements. It may be the case that the user has a more specific description of the requirements for a system, such as detailed transaction specific information.

When evaluating such systems, it would be desirable to allow for some margin or headroom in capacity. If a system was specified to run too close to capacity, then it could become backlogged and deny service when bursts of activity exceeded the average workload. It would also be desirable to factor in hardware utilization limits for specific hardware components. An upper utilization limit could reduce the likelihood of under-capacity and the resulting bottleneck. A lower utilization limit could reduce the likelihood of over-capacity and the resulting excessive spending.

What would be desirable is a method that accounts for hardware utilization limits in determining hardware requirements. This may include allowing for changes in the hardware utilization limits, with corresponding adjustments to the hardware requirements.

The present invention includes methods and systems for determining the hardware requirements for a database management system. In particular, the present invention includes methods and systems for determining the hardware required for an on-line transaction processing system. The hardware requirements are calculated so as to remain within user supplied hardware utilization limits.

In one illustrative embodiment, the server type, maximum processor [...] the database requirements are also input [...].

In one program according to the present invention, the program initializes hardware resource utilization limits to default hardware utilization limits. The program obtains throughput workload requirements from a human user and also obtains database requirements from the user. The program then calculates the hardware resources required to satisfy the database management system database and transaction handling requirements. The calculation determines the hardware so as to remain within the user supplied utilization limits. The program then displays or otherwise outputs the hardware requirements to the user. The program may then accept user changes either in the workload requirements or in the hardware utilization limits. The hardware resources required can then be recalculated, [...] occurring during bursts of activity. The lower limit can serve to avoid purchasing [...] discrete hardware components such as processors, memory, disk drives, and NICs can be automatically changed in response to changing requirements and utilization limits.

Brief Description of the Drawings

[...] database server;

Figure 3 is a highly diagrammatic view of a B-tree [...];

Figure 4 is a modified dataflow diagram of the method used to calculate DBMS server hardware requirements based on a workload requirement specified explicitly on user entered tpmC;

Figure 5 is a modified dataflow diagram of the method [...] and selectivity criteria; and

Figure 6 is a flowchart of a process for obtaining any changes [...] utilization limits and maintaining [...] in any resulting hardware requirements.

Detailed Description

Figure 1 illustrates generally a database server system 20 including a server 22 supported by a CRT and a printer for programming, display, maintenance, and general Input/Output uses. Within server 22 are illustrated several CPU sockets 30 and 32, with CPU sockets 30 being populated and CPU sockets 32 remaining [...] for future expansion [...].

[...] mass storage, which can include [...] meet the server's needs. [...] drives or any other technology capable of holding the contents [...]. Several Network Interface Cards (NICs) 42 [...] Networks, Ethernet, and the Internet. [...] are client computers 38. [...] processes usually run on a different [...] computer running on the Microsoft NT operating system and clients [...].

Server 22 is preferably [...]. This allows extra CPUs, memory, NICs, and [...].

Servers such as server 22 often exist to contain and manage databases, such as [...] several columns 52 and several rows 54 [...] variable length [...] tables to be stored and managed. It is possible for rows 54 to be ordered according to one [...].



[...] the addition, deletion, and change of data. [...] is a software [...] B-tree index. B-trees [...] known in the software arts. [...] such as an AVL tree or a 2-3 tree [...]. B-tree indices have the advantage of keeping [...] such as in a linear [...] condition with the addition of data to [...]. Maintaining the balance of the tree allows a low search time [...] as well.

In practice, an RDBMS may use only a B-tree for the indexing scheme, due to [...] utility and flexibility. An RDBMS may maintain a B-tree on any column for which ordered searching may be later required. [...] indexed approaches the number of columns in [...], the indices themselves approach and pass [...] itself. Thus, the data storage requirements of the indices [...] considered when determining the mass storage requirements.

Figure 3 illustrates a B-Tree 
.eel,n.SM.andSStonodes,0,.,and.re.e^^.ve^^^^^^^ 

levenareiUus.«tedasheingdouhlyhn.ed.ems vest. 

...craswellthroughuseofsuch ir.^ .e last levelin the tree. B-tree 

„ n0arepointedtobythelmksatleve,2. Level 



so .0. .eve. o, a .ee He., o. .eve, . .e sa. . .e .e .a„u. 
,.,.o..e„ee,as,eveM..e,eveU.««a.a..on.e«eew«,.«.«-o 

,„„ava,„esuc.asasoc«secu..™™.eHssea.c«fo,««.e.e..osucH 

.....e...a.,,eve,.....eve,a.....e.ea...«Ua.M.«^^ 
„„.sn.anan4a„Un.ea.<,se.e.asaao.«vU„.e<,Us..,Un.sn3a„.U3.n 



manner. 



.,„..,e„ee.o.e.e...a.po.„..o„o.s..ene.UeveUe,a.ea.o.o..^^ 

.a.e,a..opo.... — wMc.no.es.oso.o..va.es.ea.e,*a.^^^ 

.e va,ue, .es.c«ve.. B-T.es a. aa..a.. va. . w.. *e, .ave « 
.e.>.e.eve,. . so»e aa..a.e. te^e^ •*.ca, o.e.. — s , *e 

,„,„e,eve..as*e.co...— H„.e..o.e.e...n.ese...as. 
^ ,,„.,..H.«ac.ea..e.co..a.eeno...e..*„o..e,.Oneee.^. 
„,.,.e,eco.. .o.e.aa..ases,.e...»ea..— -o.e.e 
„^.eno.e...e.„.e,eveuo„..on.po..e.o.«eo.na™.e.s.^^^ 

,_...s„.e— s,*e.«...eveUon..n3„e.*e.*e.eeo.„n„.^^^^ 

^„ .,.„..o.e.co....e.e. .s.ea. a ».ae .e, . co„.ne., 

.a.c.o„«>a..e..oo...*e«c„.onn.«s.. .o.e.a.p,e,asea..o B-.« 
„..eao„«.n.e,us«.na„e™a.«.™on,vaso.aUecuH.n...e.„po 

co.p,e«o„. A.o.e. B.ee o, U^. op«™.a ,n.e. .a.. o„ soc.a 
™..e. . .p.. sea... .e «.o. o. .n.«s. n 

.3 scHe.e.aUeas.o.e™„.O.s...e.a«e..e«...eve..as.ee.eae.e. THe 



process. 

RDBMSs and selectable in other RDBMSs. 

The amount of mass storage required for a single table is a function of several [...]. The mass storage required is not a simple [...] of the rows and columns, for several reasons. First, the column sizes or [...] may be variable. Second, the page size enters into the calculation in a non-continuous manner, as some database [...] page boundaries, with some space wasted as a result. Third, some space in a page is set aside for future expansion or reserved for use as buffer space, as when re-ordering [...]. [...] may be in use by the RDBMS itself or for other overhead. In particular, in some RDBMSs, a number of rows are set aside as non-usable. In some RDBMSs, a portion of each record is set aside as non-usable. As previously mentioned, the size of the indices may be a large portion of table storage even though the data itself may be stored within the indices. All of the aforementioned factors make sizing the required databases a complicated matter, as is dealt with below.



Figure 4 illustrates a method 200 for determining the required size of a DBMS [...]. In one embodiment, the TPC-C benchmark is used as the [...]. The TPC-C [...] is the [...] of New Order Transactions per [...] new order transactions per minute a system generates while the system is executing four other transaction types: payment, order status, delivery, and stock level. In this [...] the New Order transaction [...] approximately [...] of the workload [...] transactions per second (TPS). Or, TPS is the total number of transactions executed per second and is calculated using the equation:

TPS = tpmC / 36

[...] in percent 210, and the tpmC requirement [...]. Optionally, the method can include the tpmC handling ability of a baseline system, indicated at 214, for purposes of comparison. Server type 208 may also be referred to as a server family, as the server type may include a configurable system [...] systems having varying [...] memory, and mass storage.

The TPC-C benchmark is an Online Transaction Processing (OLTP) benchmark. Key characteristics are a high frequency of [...], a large proportion of the transactions are inserts and updates, and transaction response times should be one second or less. Thus, these characteristics require indexed database access to minimize response time. Further, the database is expected to grow in proportion to the transaction rate.

A series of measurements were conducted with different numbers of processor configurations and processor speeds. Database sizes were proportionately increased with expected increase in transaction throughput. All benchmark tests were run to 100% processor utilization for the given configuration. The achieved throughput (tpmC), memory used, number of processors, processor speed, mass storage used, and number of users simulated were recorded for each benchmark test.

Using the results of these tests, a matrix of the above performance and configuration values for each configuration is built. Where configurations are missing, interpolation techniques are used to supply the missing values.

For example, the following set of published measurement results was used:

System                           MHz  nCPU  tpmC   Mass Storage  No. Users  Memory
Unisys/Aquanta ES2025 Server     550  2     10266  705           8160       1024
Unisys/Aquanta ES2043 Server     500  4     23?90  1593          18600      4096
Unisys/Aquanta ES2043R Server    500  4     23190  1593          18600      4096
Unisys/Aquanta ES2045 Server     500  4     23852  1636          19050      4096
Unisys/Aquanta ES5045            500  4     [...]  1815          19480      4096
Unisys/Aquanta ES2085R Server    550  8     37757  3079          30480      4096
Unisys/Aquanta ES5085R Server    550  8     40670  3562          32550      4096

This data was then averaged for each configuration and scaled to the lowest MHz rating. This yielded the following table:

nCPU  tpmC   Mass Storage  No. Users  Memory
4     23640  1659          18933      4096
8     35649  3019          28650      4096

[...] processor busy from one to the maximum possible processors. For 500 MHz processors, the results were as follows:

nCPU  tpmC   Mass Storage  No. Users  Memory
1     [...]  352           4902       512
2     [...]  [...]         7418       1024
3     [...]  1150          13175      1024
4     23640  1659          18933      4096
5     26642  1999          21362      4096
6     29644  2339          23791      4096
7     [...]  2679          26221      4096
8     35649  3019          28650      4096
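
The averaging, scaling, and interpolation steps described above can be sketched as follows. This is a minimal illustration rather than the sizer's actual code: the function names are hypothetical, the scaling is assumed to be linear in clock rate, and only published rows whose values are fully legible in the text are used.

```python
from statistics import mean

# Published results quoted in the text: (MHz, nCPU, tpmC, mass storage, users).
# Only rows whose values are fully legible in the source are listed here.
PUBLISHED = [
    (550, 2, 10266, 705, 8160),
    (550, 8, 37757, 3079, 30480),
    (550, 8, 40670, 3562, 32550),
]

def average_and_scale(rows, target_mhz=500):
    """Scale each row to target_mhz, then average the rows that share an nCPU."""
    by_ncpu = {}
    for mhz, ncpu, *values in rows:
        factor = target_mhz / mhz  # assumption: values scale linearly with MHz
        by_ncpu.setdefault(ncpu, []).append([v * factor for v in values])
    return {n: [mean(col) for col in zip(*vals)] for n, vals in by_ncpu.items()}

def interpolate(n, low, high):
    """Linearly interpolate a list of values between two (nCPU, values) points."""
    (n_low, v_low), (n_high, v_high) = low, high
    t = (n - n_low) / (n_high - n_low)
    return [a + t * (b - a) for a, b in zip(v_low, v_high)]

scaled = average_and_scale(PUBLISHED)
```

With these inputs the 8-CPU entry rounds to the tpmC, mass storage, and user counts quoted for eight 500 MHz processors, and interpolating mass storage halfway between the 4-CPU and 8-CPU rows gives the 6-CPU figure.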



Using scaling techniques, similar results can be calculated for the 550 MHz processors.

In the method 200, the smallest configuration satisfying both the tpmC requirement 212 and the MaxProcessorUtilization% 210 is selected. The configuration selected then yields the outputs #CPURequired 216 and MemoryRequirementsMB 220. The EffectiveCPUUtilization 218 is calculated as

EffectiveCPUUtilization 218 = (tpmC Requirement 212) / (tpmC at 100% for this configuration)

The outputs MassStorageRequirement 222 and #UsersSupported 224 are obtained by interpolating between values for the configurations in the same [...]. #UsersSupported 224 is calculated as

#UsersSupported = (#UsersSupported with 1 CPU) + (EffectiveCPUUtilization 218) * ((#UsersSupported with 2 CPUs) - (#UsersSupported with 1 CPU))

A similar calculation is made for MassStorageRequirement 222.
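
The selection and interpolation just described can be sketched as below. This is an illustrative reading, not the patented implementation: the function name and table layout are hypothetical, the table holds only three consecutive rows of the 500 MHz results that are legible in the text, and interpolating from zero when no smaller configuration exists is an assumption.

```python
# nCPU -> (tpmC at 100% processor busy, users supported), 500 MHz processors.
TABLE_500MHZ = {
    4: (23640, 18933),
    5: (26642, 21362),
    6: (29644, 23791),
}

def size_for_tpmc(tpmc_required, max_utilization, table=TABLE_500MHZ):
    """Return (nCPU, effective CPU utilization, users supported), or None.

    Picks the smallest configuration whose effective utilization stays at or
    below max_utilization (a fraction such as 0.80).
    """
    ncpus = sorted(table)
    for i, n in enumerate(ncpus):
        tpmc_full, users = table[n]
        effective = tpmc_required / tpmc_full
        if effective <= max_utilization:
            # Assumption: with no smaller row available, interpolate from zero.
            users_lower = table[ncpus[i - 1]][1] if i > 0 else 0
            supported = users_lower + effective * (users - users_lower)
            return n, effective, supported
    return None  # no listed configuration satisfies the requirement

result = size_for_tpmc(20000, 0.80)
```

For a requirement of 20,000 tpmC at an 80% limit, the 4-CPU row is rejected (about 85% busy) and the 5-CPU row is selected at roughly 75% effective utilization.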
The mass storage required 222 can also be determined by method 200. The [...] required can be satisfied by adding the appropriate size and number of mass storage [...] using the method described in related applications U.S. Patent Application Serial No. ______, filed ______, entitled ALGORITHM TO CALCULATE MASS STORAGE REQUIREMENTS FOR NT SIZER, [...], filed ______, entitled DATABASE SIZER FOR NT SIZER [...].

[...] tpmC [...]. In one embodiment, [...] filtered [...] on the operating system and database management system. In one embodiment, [...] DBMS [...] a list which is selectable via a drop down list to a single tpmC database record, which can be termed the "baseline" system. The tpmC value from the record then [...] the baseline [...]. In some embodiments, a known system from a vendor is selected as the baseline system on one part of the screen. The requirements can be used to select another system, termed the target system, from a second drop down in another part of the screen. The tpmC of the target system can be compared to the tpmC of the baseline system using ratio 226.

Referring now to Figure 5, in another embodiment of the invention, more detailed inputs are provided to the program to allow direct calculation of the estimated system configuration requirements. In Figure 5, a method is used to calculate [...], which can be quantified in units of total CPU utilization and NIC utilization. A user defined transaction can include a transaction name, indicated at TxnName 260, the expected execution rate of the transaction, indicated at TxnRate 262, a LAN speed, indicated at LANSpeed, and a transaction composition, indicated at TxnComposition 264. TxnName 260 can be a user-defined name given to each of the user-defined transactions. TxnRate is the expected execution rate of the user-defined transaction, typically in [...] per [...]. [...] includes the user specified SQL composition of the transaction [...] below.



TxnComposition 264 can be determined by a method 252, and includes the SQL statement compositions 254, including the numbers of those SQL statements, and some parameters included for each SQL statement. SQL statement [...] insert contributions, delete contributions, update contributions, and select contributions, each of which can include the seconds of total CPU time [...] both ways on the LAN. Method 252 can sum together the workload contributions of [...].

[...] additional input to [...] of each SQL insert statement, indicated at SQLName 270, and the number of identical SQLs, indicated at NoSQLs 272. In general, the larger the number of SQLs, [...] of CPU usage and number of bytes transferred on the LAN. [...] inserted and the average column per row size. [...] CPU usage and number of bytes transferred on the LAN. The CPU usage is typically [...]. For further discussion, see the experimental results section.

[...] select method, the workload contribution of each SQL [...] three other parameters are the number of tables joined in the select [...] indicated at Selectivity [...] number of columns [...].

Select method 282 includes a selectivity criteria input 286, which [...] type of SQL select statement specified. In a preferred embodiment, [...] OLTP applications. In the same embodiment, [...] selectivity [...]. [...] two-table join, criteria 286 is the number of rows selected from the [...] table, [...] the selectivity is assumed to be an average of four for [...]. [...] selectivity 286 is the number of values in the WHERE clauses which pertain to the outer table.

SQL select method 282 typically generates output 284 in units of seconds of CPU usage and number of bytes transferred on the LAN. The CPU usage is a complex function of the selectivity, including the [...]. For further discussion, see the experimental results section.

[...] TxnComposition 264, which can have units of seconds of CPU usage. TxnComposition 264 can then be multiplied by the transaction rate TxnRate 262 to arrive at the transaction workload contribution, TxnWorkloadContribution 258. Each of the TxnWorkloadContributions 258 can be summed together to arrive at a first estimate of total CPU utilization and total NIC utilization.

The value of the system configuration requirement can be obtained from the method SystemRequirementCalculations 294. The method 294 uses as input the two TxnWorkloadContribution components 258 and the input parameters MaxProcessorUtilization 296 and MaxNicUtilization 300. The method 294 produces the system configuration requirement. The system configuration requirement consists of the outputs #CPURequired 216, EffectiveCPUUtilization 218, MemoryRequirements 220, the LAN speed, indicated at CommRequirement 304, the effective communications transfer rate, indicated at CommsRate 306, and the effective utilization of the NIC, indicated at EffectiveNICUtilization 308.

Referring now to Figure 6, a method 300 is illustrated for calculating and recalculating the required hardware for a DBMS system. In a first step 302, hardware utilization limits can be initialized to default limit values. Examples of hardware utilization limits [...]. In one example, processor utilization limits include a default upper [...] utilization limits include a default upper utilization limit of 35 percent. The upper [...] bursts of processing workload and also allow for any miscalculation of the required [...] hardware. In one example, because of the distribution of messaging in [...] cards is almost a requirement for communications cards to reduce the otherwise resulting backup due to the transmissions and retransmissions required by a [...] provide for a floor beneath which utilization should not slip, to avoid paying for unneeded excess capacity.

Step 304 includes obtaining the workload requirement. The workload requirement can be obtained as a tpmC requirement as discussed with respect to Figure 4. The workload can also be supplied as a more detailed input as discussed with respect to Figure 5. The workload requirement can also be left at the last value, with no changes. This can be useful when running the program through multiple scenarios. In step 306, the hardware requirements can be calculated while remaining within the hardware utilization limits. In one embodiment, the hardware requirements are determined interactively, with an initial determination of hardware made, followed by a comparison against upper and lower utilization limits. In this embodiment, the percent utilization of some hardware resources such as processors can then be estimated and compared against upper and lower processor utilization limits. If the processor is over utilized, then the number of processors can be increased and the computations repeated. If the processor is underutilized, then the number of processors can be decreased and the computations repeated. In another example, if the NICs are over utilized, then the number of NICs can be increased and the computations repeated. If the NICs are underutilized, then the number of NICs can be decreased and the computations repeated. In some systems, the speed of the LAN can also be adjusted to provide for increased or decreased NIC utilization.
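
The increase-and-repeat, decrease-and-repeat adjustment of step 306 can be sketched as follows. This is a simplified illustration under stated assumptions: the function name, the `max_units` cap, and the rule that a decrease is skipped when it would break the upper limit are all hypothetical and are not taken from the text.

```python
def adjust_count(load, upper, lower, start=1, max_units=64):
    """Return (unit count, utilization) kept within [lower, upper] where possible.

    load is the demand in units of one fully busy component, for example
    2.5 means two and a half processors' (or NICs') worth of work.
    upper and lower are utilization limits expressed as fractions.
    """
    n = max(1, start)
    # Over-utilized: add a component and repeat the computation.
    while load / n > upper and n < max_units:
        n += 1
    # Under-utilized: remove a component, but never exceed the upper limit.
    while n > 1 and load / n < lower and load / (n - 1) <= upper:
        n -= 1
    return n, load / n
```

For instance, a load of 2.5 processors with an 80% upper limit settles at four processors (62.5% busy), while a load of 1.0 starting from eight processors shrinks to two.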

In step 308, the hardware requirements can be output, typically displayed on a CRT and optionally printed. Hardware requirements can include those discussed with respect to Figure 4, as well as other hardware requirements. In a preferred embodiment, the number of processors used to populate a given server family are output along with a number of NICs to be provided with the system.

In step 310, the user can be prompted for any changes to the utilization limits. In some embodiments, the user can be prompted for changes to workload requirements as well. In either case, if no further changes are desired, the user can [...] workload requirements can take [...] at 304 and new [...] requirements [...] viewed on the computer screen and seen to be just above an upper utilization limit. [...] configuration viewed. In another example of use, the LAN speed could be adjusted upward, and the effect on the percent utilization and number of NICs viewed. In this way the number of discrete hardware components required, such as NICs and [...], may be varied in response to varying workload requirements and varying percent utilization limits.

Measurement Definition

A series of 14 SQL statements were defined for measurement on NT Servers running the database products SQL Server 6.5 and Oracle 8.04. These SQL [...] TPC-D database, were thought to encompass several possible scenarios and business cases for the OLTP environment.

The series of SQL statements consists of the following types:

• Inserts

• Deletes

• Updates

• Single table select: three selectivities

• Two table join: three selectivities

• Three table join: three selectivities

The intent behind the set of measurements was that any OLTP transaction can be thought of as some combination of the above SQLs or extensions of them. Thus, the sizer will solicit the user for the series of generic SQLs comprising each transaction. Based on curve fit estimates obtained from the measurement results, one can then estimate the CPU usage for each SQL and subsequently each transaction. By applying workload metrics to each transaction, one can then calculate the CPU utilization and subsequently, the CPU requirement.

The measurement of each SQL was run individually from a client, with no other load on the system. Each test consisted of one or more executions of the same SQL. A SQL may have been run several times in order to obtain a sufficient sample size in the collection of resource usage data. Further, each execution of a given SQL was written to cause the RDBMS to re-parse the SQL before executing it.

Scripts were developed to execute the series of SQLs and to cause certain RDBMS data to be collected and recorded. In order to implement re-parsing for each SQL execution, the script was written accordingly for each RDBMS. Specifically, SQL Server allowed looping and also parsed for each instance within the loop, whereas Oracle appeared to parse only once within a loop; consequently, the Oracle SQLs could not be looped, and the scripts were written accordingly.

During each test, performance data was collected from the NT Performance Monitor on the server as well as from the RDBMS data collection construct (SQL Server's ISQL DBCC commands and Oracle's utlbstats/utlestats). This was after the NT server, the NT Performance Monitor, and the RDBMS were conditioned prior to each test.

Measurement Results

Although several metrics were collected from each measurement, the metrics of most relevance to this version of the sizer were those that measured the CPU resource usage. This consisted of the following:

• Elapsed time

• Total per cent processor busy

• Per cent processor busy due to the RDBMS

Other statistics, such as logical reads and writes, and physical reads and writes, were also collected. However, this data was not used in the current version of the sizer. This was because all of the table accesses were indexed, and the number of logical IOs was always the number of B-tree levels plus data, consistent with an OLTP environment.

The processor data used in the sizer is the total processor usage, consisting of the RDBMS portion plus the kernel, where the kernel activity is assumed to be attributed to the IO activity. In the current version of the sizer the CPU usage due to IO activity is not calculated separately.

Applicant believes that the total CPU time should be approximately the same for a given RDBMS, independent of the number of processors configured. The rationale is that the same amount of work is being done but that the work may be parsed out to a multiple number of processors; the exception is that there is some overhead to parse the work out so that possibly the total CPU time might increase slightly as a function of the number of processors configured. Results from most of the Oracle measurements support this conjecture. It is expected that the same is true for SQL Server.

Analysis and Curve Fitting

Insert

For inserts, a single record was inserted into a table 10 times followed by a commit. This sequence was repeated 582 times. The CPU time (on a 200 MHz processor system) attributed to inserts, and used in the sizer, was then calculated as

Seconds per record inserted = (Total CPU Time) / 582 / 10

Values used in the sizer are:

SQL Server 0.004761
Oracle     0.009

Delete

The deletes were measured during the process of reverting the database to its original state following the inserts. For the table used, an average of 3.88 records were deleted per delete SQL. In this case, 10 deletes were followed by a commit, and this process was repeated 150 times to return the table to its original condition. The CPU time attributed to deletes, and used in the sizer, was calculated as

Seconds per delete SQL with 3.88 records deleted per SQL = (Total CPU time) / 150 / 10

Values used in the sizer are:

SQL Server 0.015416
Oracle     0.015

Update

Three sets of measurements were conducted for update SQLs:

• A single SQL updated one record; this was followed by a commit
• A single SQL updated five records; this was followed by a commit

Each set of measurements was repeated 100 times.

SQL Server: For the sizer, the CPU time attributed to SQL Server updates was calculated based on the first two points, specifically:

CPU Time = 0.0012491 * (No. records updated per SQL) + 0.0004697 seconds

Oracle: The results show a fair amount of reduction in overall CPU time for updates as a function of the number of CPUs. The curve fitting for CPU time attributed to Oracle updates was calculated based on both the number of CPUs and the number of records updated per SQL, specifically:

CPU Time = -0.00248697 * (No. CPUs) + 0.00033591 * (No. records updated per SQL) + 0.00995405 seconds

Currently the sizer pessimistically uses calculations based on 1 CPU.
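
The insert, delete, and update estimates above can be collected into a small lookup, sketched here for illustration. The names are hypothetical, and the Oracle insert and delete constants assume the garbled source values read as 0.009 and 0.015 seconds.

```python
# CPU seconds per statement, as quoted for a 200 MHz processor system.
INSERT_SECONDS = {"sqlserver": 0.004761, "oracle": 0.009}   # per record inserted
DELETE_SECONDS = {"sqlserver": 0.015416, "oracle": 0.015}   # per delete SQL (~3.88 records)

def update_seconds(rdbms, records_per_sql, n_cpus=1):
    """Estimated CPU seconds for one update SQL, using the fitted formulas."""
    if rdbms == "sqlserver":
        return 0.0012491 * records_per_sql + 0.0004697
    if rdbms == "oracle":
        # The text notes the sizer pessimistically assumes n_cpus == 1.
        return -0.00248697 * n_cpus + 0.00033591 * records_per_sql + 0.00995405
    raise ValueError(f"unknown RDBMS: {rdbms}")
```

Leaving `n_cpus` at its default reproduces the pessimistic single-CPU estimate for Oracle.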

Single Table Select - Indexed

Three single table SELECT SQLs were individually run. The WHERE clauses were set to produce selectivities of 2,588, 12,477, and 24,896 records, respectively. These SQLs were defined so that the access method of the RDBMS was indexed, consistent with an OLTP environment. For SQL Server this was via a cluster key, thus eliminating the additional IO to another leaf page that would be needed if the index were non-clustered. For Oracle, this was via a standard index. The most suitable curve fit for both RDBMSs seemed to be to divide the CPU time by the number of records selected, and then average that value. A value of 0.1 msec was given each RDBMS for parsing the SQL.

The resulting formulas used, and applied in the sizer, were the following:

SQL Server: (CPU Time) = 0.0000673 * (Records Selected) + 0.0001

Oracle: (CPU Time) = 0.00218 * (Records Selected) + 0.0001

Note that the usual methods of curve fitting would apply to the selectivities noted, i.e., a very large number of records selected. Extrapolating to a few records selectivity (consistent with OLTP) produced either extremely large or extremely small CPU time overheads. As a result, the method of averaging time per record selected was chosen as a means of estimation.

Two Table Join - Indexed

For the two table join, an example of the business basis for the SQL SELECT was to find the various sources (e.g., prices) of a given item or list of items. In this case, there were on the average four such sources for each item. Three SQL SELECTs were constructed to return the sources for one, five, and 10 items, respectively. Results of the measurements for each RDBMS are shown in the charts below.

SQL Server: Curve fitting based on the template y = a * x^b produced the following results:

(CPU Time) = 0.0014665 * (Items Selected) ^ 0.7552

Oracle: Curve fitting based on the template y = a * x + b produced the following results:

(CPU Time) = 0.00858 * (Items Selected) + 0.003236

Three Table Join - Indexed

For the three table join, an example of the business basis for the SQL SELECT was to determine the status of orders placed on certain dates from a selected segment of the customer population. For this particular database, the customer segment chosen places about 20% of all of the orders. Each order consists of approximately 4 items on the average. For this set of measurements three SQLs were defined to return status on one, five, and 10 order dates.

Curve fitting for both the SQL Server and Oracle RDBMSs was based on the template y = a * x + b. For the Oracle case, the measurement results were first averaged over the number of CPUs for each point, then linear curve fitting was applied to the averages.

The resulting formulas used, and applied in the sizer, are the following:

SQL Server: (CPU Time) = 0.001306 * (Selectivity on Outer Table) - 0.00039

Oracle: (CPU Time) = 0.002521 * (Selectivity on Outer Table) + 0.001
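
The fitted select formulas can be gathered into one helper, sketched below for illustration. The function name and argument layout are hypothetical, and the SQL Server single-table coefficient is an assumed reading (0.0000673) of a value that is garbled in the source.

```python
def select_seconds(rdbms, tables, n):
    """Estimated CPU seconds for one indexed SELECT.

    n is the number of records selected (1 table), items selected (2 tables),
    or the selectivity on the outer table (3 tables).
    """
    if rdbms == "sqlserver":
        if tables == 1:
            return 0.0000673 * n + 0.0001   # assumed reading of a garbled coefficient
        if tables == 2:
            return 0.0014665 * n ** 0.7552  # power-law template y = a * x^b
        if tables == 3:
            return 0.001306 * n - 0.00039
    elif rdbms == "oracle":
        if tables == 1:
            return 0.00218 * n + 0.0001
        if tables == 2:
            return 0.00858 * n + 0.003236
        if tables == 3:
            return 0.002521 * n + 0.001
    raise ValueError(f"unsupported combination: {rdbms}, {tables} tables")
```

Keeping all six fits behind one function makes it straightforward to sum the CPU time of every SQL that makes up a user-defined transaction.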
T . AN USAGE 

T^is section provides the formulas used to calculate the LAN bytes passed per 
SQL statement. 

Using the Ethernet and TCP/IP protocol, a certain amount of bytes are passed 
with each message. Some of these values are as follows: 
FrameHeader = 72 bytes 

men a SQL arrives or wlren it is eitlier simply acknowledged or data is 
passed back, the following assumpHons are made as «> the ^ount of data transfe^ed 

15 across the LAN: 

SQLInsertlncomingData = 200 bytes 
SQLUpdatelncomingData = 200 bytes 
SQLDeletelncomingData = 200 bytes 
SQLSelectlncomingData = 200 bytes 
20 SQLAcknowledge = 1 00 bytes 

The amount of data CommBytes transferred both ways across the LAN is 
estimated as follows per SQL: 

Inserts: 

CommBytes = 2 * FrameHeader + SQLInsertIncomingData + 
AverageColumnsPerRow + SQLAcknowledge 

Deletes: 

CommBytes = 2 * FrameHeader + SQLDeleteIncomingData + SQLAcknowledge 

Updates: 

CommBytes = 2 * FrameHeader + SQLUpdateIncomingData * NoRecords + 
SQLAcknowledge 

Selects: 

1 Table: SelectedBytes = Selectivity * BytesPerColumn * NoColumns 
2 Tables: SelectedBytes = 4 * Selectivity * BytesPerColumn * NoColumns 
3 Tables: SelectedBytes = 5 * Selectivity * BytesPerColumn * NoColumns 

CommBytes = FrameHeader + SQLSelectIncomingData 

Do While SelectedBytes > 1500   ' too big for a frame, break up into 1500 bytes 
    CommBytes = CommBytes + TCPIPHdr + 1500 
    SelectedBytes = SelectedBytes - 1500 
Loop 

If SelectedBytes > 0 Then 
    CommBytes = CommBytes + TCPIPHdr + SelectedBytes 
End If 
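
The per-statement LAN estimates above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the sizer's code: the function names are hypothetical, and because the text does not give a value for TCPIPHdr, it is passed in as a required parameter rather than guessed.

```python
FRAME_HEADER = 72        # bytes, from the text
SQL_INCOMING_DATA = 200  # bytes; same value for insert, update, delete, select
SQL_ACKNOWLEDGE = 100    # bytes, from the text
MAX_FRAME_PAYLOAD = 1500 # bytes; the frame limit used in the loop


def insert_comm_bytes(average_columns_per_row):
    return (2 * FRAME_HEADER + SQL_INCOMING_DATA
            + average_columns_per_row + SQL_ACKNOWLEDGE)


def delete_comm_bytes():
    return 2 * FRAME_HEADER + SQL_INCOMING_DATA + SQL_ACKNOWLEDGE


def update_comm_bytes(no_records):
    return 2 * FRAME_HEADER + SQL_INCOMING_DATA * no_records + SQL_ACKNOWLEDGE


def selected_bytes(tables, selectivity, bytes_per_column, no_columns):
    # Multipliers 1, 4 and 5 for one-, two- and three-table selects, per the text.
    multiplier = {1: 1, 2: 4, 3: 5}[tables]
    return multiplier * selectivity * bytes_per_column * no_columns


def select_comm_bytes(result_bytes, tcpip_hdr):
    """Split the result into 1500-byte frames, adding tcpip_hdr per frame.

    tcpip_hdr is not specified in the text, so the caller must supply it.
    """
    comm_bytes = FRAME_HEADER + SQL_INCOMING_DATA
    while result_bytes > MAX_FRAME_PAYLOAD:
        comm_bytes += tcpip_hdr + MAX_FRAME_PAYLOAD
        result_bytes -= MAX_FRAME_PAYLOAD
    if result_bytes > 0:
        comm_bytes += tcpip_hdr + result_bytes
    return comm_bytes
```

For example, with a purely hypothetical header size of 40 bytes, a 3200-byte result is sent as two full frames plus a 200-byte remainder, so `select_comm_bytes(3200, 40)` adds three headers to the payload and the fixed request overhead.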



HEADROOM CALCULATIONS 

This section describes a method for calculating the system configuration 
requirement and the headroom. 

Processor Requirement 

Define 

TxnRate_I = transaction rate of transaction I 

CPUTime_I = total CPU time used to execute one instance of transaction I 

TxnRate_T = sum of TxnRate_I over all transactions 

For each transaction, an input to method 294 is the transaction's effective 
contribution to CPU utilization, expressed as 

(CPUTime_I) * (TxnRate_I) 

Define 

Util_1CPU = sum of (CPUTime_I) * (TxnRate_I) over all transactions 

So Util_1CPU is defined as the effective utilization of a single CPU. Note 
that this value can exceed 100%, implying that more than one CPU is required to 
service the transaction rate. 

One method to determine the CPU requirement NoCPUs is as follows: 

NoCPUs = CEILING(Util_1CPU/100) 
While (Util_1CPU/100)/NoCPUs > MaxProcessorUtilization 
    NoCPUs = NoCPUs + 1 
Wend 

We note, however, that adding multiple processors does not increase 
performance with 100% efficiency. Thus, for example, by doubling the number of 
processors, we achieve anywhere from 150% to 180% as much performance, rather 
than 200%. Much of this is due to a combination of two effects: instruction speed 
slows because of locality of reference, and the number of instructions per 
transaction increases because of increased housekeeping overhead. 

The service time per transaction is the quotient (instructions per transaction) / 
(instructions per second). It has been found in studies that the service time per 
transaction is a linear function of the number of processors, i.e., 

(service time per transaction, n processors) = (service time, 1 processor) * (a * n + b) 

where a and b are constants determined from analysis of measurement results. Thus, 
the transaction rate is proportional to the reciprocal, that is, 

transaction rate ~ 1 / (a * n + b) 

For this version of the sizer, the values a, b are: 

a = 0.028982 
b = 0.973703 

The above algorithm is then modified as follows: 

Define 

NoCPUs = CEILING(Util_1CPU/100) 
CPUSvc_1x = Util_1CPU / TxnRate_T 
bResult = False 
While Not bResult 
    temp = NoCPUs / (a * NoCPUs + b) * MaxCPUUtilization 
    If temp < Util_1CPU Then 
        NoCPUs = NoCPUs + 1 
    Else 
        bResult = True 
        EffectiveCPUUtilization = Ceiling((a * NoCPUs + b) * Util_1CPU / NoCPUs * 100) 
    End If 
Wend 
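
The modified processor calculation can be sketched in Python as shown here. This is a hedged illustration rather than the sizer's implementation. It assumes that both the single-CPU utilization and the maximum utilization are expressed as fractions (2.5 meaning 250%, 0.8 meaning 80%), which is one consistent reading of the listing. The `max_cpus` cap and the `None` return value are additions for illustration only: since n / (a * n + b) approaches 1 / a as n grows, a sufficiently large workload can never be satisfied, and the cap stands in for the step of moving to the next faster processor.

```python
import math

A = 0.028982  # scaling constants for this version of the sizer, from the text
B = 0.973703


def size_cpus(util_1cpu, max_cpu_utilization, a=A, b=B, max_cpus=64):
    """Return (no_cpus, effective_utilization_percent), or None if no fit.

    util_1cpu and max_cpu_utilization are fractions (assumption).
    max_cpus is a hypothetical machine limit, not taken from the text.
    """
    no_cpus = max(1, math.ceil(util_1cpu))
    while no_cpus <= max_cpus:
        # Usable capacity of no_cpus processors after the scaling penalty.
        capacity = no_cpus / (a * no_cpus + b) * max_cpu_utilization
        if capacity >= util_1cpu:
            effective = math.ceil((a * no_cpus + b) * util_1cpu / no_cpus * 100)
            return no_cpus, effective
        no_cpus += 1
    return None  # caller would try the next faster processor type
```

With a single-CPU utilization of 2.5 and a limit of 0.8, three processors fall short once the scaling penalty is applied, so the sketch settles on four.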

Using this method, if the selected processor type does not meet the transaction 
method, then the next faster processor is selected, and the above process is repeated. 
The headroom algorithm ensures the following: 

The EffectiveCPUUtilization in the above calculations does not exceed the 
MaxCPUUtilization. The user can therefore select a MaxCPUUtilization which is 
lower than required for good performance, thus allowing "headroom" for growth. 

If the MaxCPUUtilization criterion is not achievable, then the next faster CPU 
is selected. 

LAN Speed Requirement 

The LAN speed requirement is determined in a manner similar to that described 
above. 

Define 

NIC_Speed(i) = bandwidth of NIC card i, expressed in megabits per second; values 
are 10, 100, 1000 
NICType = bandwidth of NIC card selected 
CommReq = bytes per second required for all of the transactions 
MaxNICUtil(i) = recommended maximum utilization of NIC card i 

NICType = 0 
For i = 1 To Number of Card Types, starting with lowest capacity 
    Temp = NIC_Speed(i) * MaxNICUtil(i) / 8 / 1000000 
    If CommReq <= Temp Then 
        NICType = NIC_Speed(i) 
        Exit For 
    End If 
Next 

If NICType = 0 Then 
    Message: "The communications requirement exceeds the effective 
             maximum imposed by the utilization limit for all 
             LAN speeds." 
End If 
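
A Python sketch of the card selection follows. It is illustrative only and makes an explicit unit assumption: card speeds are in megabits per second and the requirement is in bytes per second, so the usable capacity is computed as speed * 1,000,000 / 8 * utilization. That conversion is this sketch's reading of the intended units and is not a literal transcription of the Temp expression in the listing. The function name and the utilization values in the usage note are hypothetical.

```python
def select_nic(comm_req_bytes_per_sec, cards):
    """Pick the slowest card whose usable capacity covers the requirement.

    cards is a list of (speed_mbps, max_utilization) pairs, where
    max_utilization is a fraction. Returns the selected speed in megabits
    per second, or 0 if no card is sufficient (mirroring NICType = 0).
    """
    for speed_mbps, max_util in sorted(cards):
        # Assumed conversion: megabits per second to usable bytes per second.
        usable = speed_mbps * 1_000_000 / 8 * max_util
        if comm_req_bytes_per_sec <= usable:
            return speed_mbps
    return 0
```

For instance, with a hypothetical 50% utilization limit on every card, `select_nic(1_000_000, [(10, 0.5), (100, 0.5), (1000, 0.5)])` skips the 10 megabit card, whose usable capacity is 625,000 bytes per second, and selects the 100 megabit card.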

This algorithm also demonstrates the headroom algorithm. 

Memory Calculations 

Because the database servers are of enterprise quality, the amount of 
memory required is generally very high. As a rule of thumb, the memory requirement 
has been specified as an additional 512 MB per processor up to the limit imposed by 
the machine and the operating system. 

Numerous advantages of the invention covered by this document have been 
set forth in the foregoing description. It will be understood, however, that this 
disclosure is, in many respects, only illustrative. Changes may be made in details, 
particularly in matters of shape, size, and arrangement of parts without exceeding the 
scope of the invention. The invention's scope is, of course, defined in the language in 
which the appended claims are expressed. 